-
Notifications
You must be signed in to change notification settings - Fork 1.1k
perf: improve field indexing in JSON StructArrayDecoder (1.7x speed up) #9086
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
scovich
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I understand the indexing code well enough to say whether that part is correct, but the idea of using an optional index for field name lookups makes a lot of sense to me.
| } | ||
| } | ||
|
|
||
| fn build_field_index(fields: &Fields) -> Option<HashMap<String, usize>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
qq: Do lifetimes coincide so that we could return Option<HashMap<&str, usize>> instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the lifetimes do coincide. we can use HashMap<&'a str, usize> by taking fields: &'a Fields as a parameter, which avoids the self-referential struct problem. However, this would require threading the lifetime parameter <'a> through the entire decoder system across many files. Since the lookup performance is identical, I don’t think it’s worth the added complexity.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be a good follow on PR
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
arrow-json/benches/reader.rs
Outdated
| use std::fmt::Write; | ||
| use std::sync::Arc; | ||
|
|
||
| fn build_schema(field_count: usize) -> Arc<Schema> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you please add some comments here with an example of what this code does / what patterns of input it creates?
Also, it would help me to reproduce your results if you could make a separate PR with the benchmarks (so I can compare main to the PR)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
separate benchmark here
| } | ||
| } | ||
|
|
||
| fn build_field_index(fields: &Fields) -> Option<HashMap<String, usize>> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe it would be a good follow on PR
|
run benchmark json-reader |
|
🤖 Hi @alamb, thanks for the request (#9086 (comment)).
Please choose one or more of these with |
7e3077e to
06ded8b
Compare
|
run benchmark json-reader |
|
🤖 Hi @Weijun-H, thanks for the request (#9086 (comment)). |
|
🤖 |
|
🤖: Benchmark completed Details
|
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Weijun-H -- I think this PR is a nice improvement. I have some suggestions on how to make it faster and improve the comments, but overall very nice 👍
| is_nullable: bool, | ||
| struct_mode: StructMode, | ||
| field_name_to_index: Option<HashMap<String, usize>>, | ||
| child_pos: Vec<u32>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add a comment that explains what child_pos is? It isn't clear here (the idea of caching rather than recreating it looks good though)
Specifically I think it is important to document what is stored at each index (e.g. each index the tape position of at field_idx * row_count + row)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
renamed and commented in df9e710
| )) | ||
| })?; | ||
| } | ||
| self.child_pos.resize(total_len, 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like it would set some elements to zero twice -- I think you can get the same result without the extra setting via
self.child_pos.clear();
self.child_pos.resize(total_len, 0);Also, I think resize calls reserve internally (it internally calls extend_with which calls reserve), so there is no need to also call child_pos.reserve above
(also the rest of this crate just calls reserve so I think using try_reserve just here seems unecessary)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
addressed in. df9e710
| fields.len() | ||
| ))); | ||
| } | ||
| child_pos[entry_idx * row_count + row] = cur_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 this is a nice way to avoid allocations
| let start = field_idx * row_count; | ||
| let end = start + row_count; | ||
| let pos = &child_pos[start..end]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to extract the field_tape_positions into another private struct that expose the api and hide the implementation detail
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored in. ad44ec2
| fields.len() | ||
| ))); | ||
| } | ||
| child_pos[entry_idx * row_count + row] = cur_idx; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can also be part of the dedicated struct
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored in ad44ec2
…exing in StructArrayDecoder
…ecoders and field index creation
…ecoder for improved clarity and performance
…der for improved clarity and performance
450e491 to
ad44ec2
Compare
|
run benchmark json-reader |
|
🤖 |
alamb
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
🤖: Benchmark completed Details
|
|
run benchmark json-reader |
|
🤖 |
|
🤔 #9086 (comment) shows some slowdowns. I will rerun and see if I can reproduce |
|
🤖: Benchmark completed Details
|
|
🤔 these look like they might be a bit slower now |
…p) (apache#9086) # Which issue does this PR close? <!-- We generally require a GitHub issue to be filed for all bug fixes and enhancements and this helps us generate change logs for our releases. You can link an issue to this PR using the GitHub syntax. --> - Closes #NNN. # Rationale for this change Optimize JSON struct decoding on wide objects by reducing per-row allocations and repeated field lookups. <!-- Why are you proposing this change? If this is already explained clearly in the issue then this section is not needed. Explaining clearly why changes are proposed helps reviewers understand your changes and offer better suggestions for fixes. --> # What changes are included in this PR? Reuse a flat child-position buffer in `StructArrayDecoder` and add an optional field-name index for object mode. Skip building the field-name index for list mode; add overflow/allocation checks. ``` decode_wide_object_i64_json time: [11.828 ms 11.865 ms 11.905 ms] change: [−67.828% −67.378% −67.008%] (p = 0.00 < 0.05) Performance has improved. Found 3 outliers among 100 measurements (3.00%) 2 (2.00%) high mild 1 (1.00%) high severe decode_wide_object_i64_serialize time: [7.6923 ms 7.7402 ms 7.7906 ms] change: [−75.652% −75.483% −75.331%] (p = 0.00 < 0.05) Performance has improved. Found 1 outliers among 100 measurements (1.00%) 1 (1.00%) high mild ``` <!-- There is no need to duplicate the description in the issue here but it is sometimes worth providing a summary of the individual changes in this PR. --> # Are these changes tested? Yes <!-- We typically require tests for all PRs in order to: 1. Prevent the code from being accidentally broken by subsequent changes 2. Serve as another way to document the expected behavior of the code If tests are not included in your PR, please explain why (for example, are they covered by existing tests)? --> # Are there any user-facing changes? No <!-- If there are user-facing changes then we may require documentation to be updated before approving the PR. If there are any breaking changes to public APIs, please call them out. -->
Which issue does this PR close?
Rationale for this change
Optimize JSON struct decoding on wide objects by reducing per-row allocations and repeated field lookups.
What changes are included in this PR?
Reuse a flat child-position buffer in
StructArrayDecoderand add an optional field-name index for object mode.Skip building the field-name index for list mode; add overflow/allocation checks.
Are these changes tested?
Yes
Are there any user-facing changes?
No